Overview

Dataset statistics

Number of variables: 12
Number of observations: 5441488
Missing cells: 498145
Missing cells (%): 0.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 498.2 MiB
Average record size in memory: 96.0 B
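
The overview figures can be cross-checked against the per-variable sections. The sketch below is only a sanity check, not part of the profiling run: the row, column, and missing-value counts are copied from this report, while the 8 bytes per cell is an assumption (one 64-bit slot per value) that happens to reproduce the reported 96.0 B record size.

```python
# Figures copied from this report; the per-column missing counts come from
# the imp_hash and sec_name variable sections.
N_ROWS = 5_441_488
N_COLS = 12
MISSING_BY_COLUMN = {"imp_hash": 469_426, "sec_name": 28_719}
BYTES_PER_CELL = 8  # assumption: one 64-bit slot (number or object pointer) per cell

missing_cells = sum(MISSING_BY_COLUMN.values())
missing_pct = 100 * missing_cells / (N_ROWS * N_COLS)
record_bytes = N_COLS * BYTES_PER_CELL
total_mib = N_ROWS * record_bytes / 2**20

print(missing_cells, round(missing_pct, 1), record_bytes, round(total_mib, 1))
```

Under these assumptions the four printed values agree with the missing-cell count, missing-cell percentage, record size, and total memory size listed above.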

Variable types

Numeric: 7
Categorical: 5

Alerts

filename has a high cardinality: 2240 distinct values (High cardinality)
sha256 has a high cardinality: 855521 distinct values (High cardinality)
imp_hash has a high cardinality: 147702 distinct values (High cardinality)
sec_md5 has a high cardinality: 1611716 distinct values (High cardinality)
sec_name has a high cardinality: 19779 distinct values (High cardinality)
Unnamed: 0 is highly correlated with win_count (High correlation)
win_count is highly correlated with Unnamed: 0 (High correlation)
sec_chi2 is highly correlated with raw_size (High correlation)
sec_entropy is highly correlated with raw_size and 1 other field (High correlation)
raw_size is highly correlated with sec_chi2 and 2 other fields (High correlation)
virtual_size is highly correlated with sec_entropy and 1 other field (High correlation)
sec_entropy is highly correlated with raw_size (High correlation)
virtual_size is highly correlated with raw_size (High correlation)
imp_hash has 469426 (8.6%) missing values (Missing)
sec_chi2 is highly skewed (γ1 = 132.3276782) (Skewed)
raw_size is highly skewed (γ1 = 267.9088872) (Skewed)
virtual_size is highly skewed (γ1 = 249.1878453) (Skewed)
virtual_address is highly skewed (γ1 = 62.57661845) (Skewed)
Unnamed: 0 is uniformly distributed (Uniform)
Unnamed: 0 has unique values (Unique)
sec_entropy has 819427 (15.1%) zeros (Zeros)
raw_size has 564952 (10.4%) zeros (Zeros)

Reproduction

Analysis started: 2022-08-08 01:37:53.468771
Analysis finished: 2022-08-08 01:40:43.795211
Duration: 2 minutes and 50.33 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 5441488
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2720743.5
Minimum: 0
Maximum: 5441487
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 41.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 272074.35
Q1: 1360371.75
median: 2720743.5
Q3: 4081115.25
95-th percentile: 5169412.65
Maximum: 5441487
Range: 5441487
Interquartile range (IQR): 2720743.5

Descriptive statistics

Standard deviation: 1570822.425
Coefficient of variation (CV): 0.5773504283
Kurtosis: -1.2
Mean: 2720743.5
Median Absolute Deviation (MAD): 1360372
Skewness: 1.069900686 × 10^-15
Sum: 1.480489311 × 10^13
Variance: 2.467483091 × 10^12
Monotonicity: Strictly increasing
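
The statistics for Unnamed: 0 are what one would expect from a plain row counter 0, 1, ..., n-1, which is consistent with the unique, uniform, and strictly increasing flags. The sketch below recomputes them from closed-form expressions, assuming (this is not stated in the report) that the column is exactly such a counter and that the reported variance is the sample variance with an n-1 divisor.

```python
import math

n = 5_441_488  # number of observations reported in the overview

# Closed forms for the integers 0, 1, ..., n-1.
mean = (n - 1) / 2
total = n * (n - 1) // 2
variance = n * (n + 1) / 12  # sample variance (n - 1 divisor)
std = math.sqrt(variance)
cv = std / mean
# Population excess kurtosis of a discrete uniform distribution on n points;
# any small-sample correction in the report's estimator is negligible here.
kurtosis = -6 * (n**2 + 1) / (5 * (n**2 - 1))

print(mean, total, variance, std, cv, kurtosis)
```

To the displayed precision these values match the mean, sum, variance, standard deviation, coefficient of variation, and kurtosis listed above, so the column carries no information beyond row position.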
Histogram with fixed size bins (bins=50)

Value   Count   Frequency (%)
0   1   < 0.1%
3627666   1   < 0.1%
3627664   1   < 0.1%
3627663   1   < 0.1%
3627662   1   < 0.1%
3627661   1   < 0.1%
3627660   1   < 0.1%
3627659   1   < 0.1%
3627658   1   < 0.1%
3627657   1   < 0.1%
Other values (5441478)   5441478   > 99.9%

Value   Count   Frequency (%)
0   1   < 0.1%
1   1   < 0.1%
2   1   < 0.1%
3   1   < 0.1%
4   1   < 0.1%
5   1   < 0.1%
6   1   < 0.1%
7   1   < 0.1%
8   1   < 0.1%
9   1   < 0.1%

Value   Count   Frequency (%)
5441487   1   < 0.1%
5441486   1   < 0.1%
5441485   1   < 0.1%
5441484   1   < 0.1%
5441483   1   < 0.1%
5441482   1   < 0.1%
5441481   1   < 0.1%
5441480   1   < 0.1%
5441479   1   < 0.1%
5441478   1   < 0.1%

filename
Categorical

HIGH CARDINALITY

Distinct: 2240
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.5 MiB

2022042601/2022042601_4   11887
2022042501/2022042501_7   11512
2022042600/2022042600_52   11173
2022042600/2022042600_49   11070
2022042501/2022042501_6   10947
Other values (2235)   5384899

Length

Max length: 24
Median length: 24
Mean length: 23.82759899
Min length: 23

Characters and Unicode

Total characters: 129657594
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022042404/2022042404_42
2nd row: 2022042404/2022042404_42
3rd row: 2022042404/2022042404_42
4th row: 2022042404/2022042404_42
5th row: 2022042404/2022042404_42

Common Values

Value   Count   Frequency (%)
2022042601/2022042601_4   11887   0.2%
2022042501/2022042501_7   11512   0.2%
2022042600/2022042600_52   11173   0.2%
2022042600/2022042600_49   11070   0.2%
2022042501/2022042501_6   10947   0.2%
2022042600/2022042600_51   10696   0.2%
2022042501/2022042501_5   10502   0.2%
2022042501/2022042501_8   10387   0.2%
2022042601/2022042601_3   10281   0.2%
2022042500/2022042500_56   9887   0.2%
Other values (2230)   5333146   98.0%

Length

Histogram of lengths of the category

Most occurring characters

Value   Count   Frequency (%)
2   47061339   36.3%
0   30974283   23.9%
4   18268536   14.1%
5   6377301   4.9%
1   5485215   4.2%
/   5441488   4.2%
_   5441488   4.2%
6   4891660   3.8%
3   2366854   1.8%
7   1215620   0.9%
Other values (2)   2133810   1.6%

Most occurring categories

Value   Count   Frequency (%)
Decimal Number   118774618   91.6%
Other Punctuation   5441488   4.2%
Connector Punctuation   5441488   4.2%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
2   47061339   39.6%
0   30974283   26.1%
4   18268536   15.4%
5   6377301   5.4%
1   5485215   4.6%
6   4891660   4.1%
3   2366854   2.0%
7   1215620   1.0%
9   1130226   1.0%
8   1003584   0.8%
Other Punctuation
Value   Count   Frequency (%)
/   5441488   100.0%
Connector Punctuation
Value   Count   Frequency (%)
_   5441488   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common   129657594   100.0%

Most frequent character per script

Common
Value   Count   Frequency (%)
2   47061339   36.3%
0   30974283   23.9%
4   18268536   14.1%
5   6377301   4.9%
1   5485215   4.2%
/   5441488   4.2%
_   5441488   4.2%
6   4891660   3.8%
3   2366854   1.8%
7   1215620   0.9%
Other values (2)   2133810   1.6%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   129657594   100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
2   47061339   36.3%
0   30974283   23.9%
4   18268536   14.1%
5   6377301   4.9%
1   5485215   4.2%
/   5441488   4.2%
_   5441488   4.2%
6   4891660   3.8%
3   2366854   1.8%
7   1215620   0.9%
Other values (2)   2133810   1.6%

win_count
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 644138
Distinct (%): 11.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 282839.2168
Minimum: 1
Maximum: 654977
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 41.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 26637
Q1: 134859
median: 269821
Q3: 401533
95-th percentile: 604726
Maximum: 654977
Range: 654976
Interquartile range (IQR): 266674

Descriptive statistics

Standard deviation: 176525.8264
Coefficient of variation (CV): 0.624120758
Kurtosis: -0.874147104
Mean: 282839.2168
Median Absolute Deviation (MAD): 133395
Skewness: 0.3200546027
Sum: 1.539066204 × 10^12
Variance: 3.116136738 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value   Count   Frequency (%)
206546   68   < 0.1%
201570   65   < 0.1%
97642   63   < 0.1%
356302   62   < 0.1%
225279   62   < 0.1%
254791   61   < 0.1%
356614   60   < 0.1%
355558   60   < 0.1%
361801   60   < 0.1%
256996   60   < 0.1%
Other values (644128)   5440867   > 99.9%

Value   Count   Frequency (%)
1   6   < 0.1%
2   10   < 0.1%
3   15   < 0.1%
4   17   < 0.1%
5   6   < 0.1%
6   12   < 0.1%
7   8   < 0.1%
8   14   < 0.1%
9   6   < 0.1%
10   10   < 0.1%

Value   Count   Frequency (%)
654977   7   < 0.1%
654976   3   < 0.1%
654975   2   < 0.1%
654974   3   < 0.1%
654973   3   < 0.1%
654972   5   < 0.1%
654971   5   < 0.1%
654970   3   < 0.1%
654969   1   < 0.1%
654968   7   < 0.1%

sha256
Categorical

HIGH CARDINALITY

Distinct: 855521
Distinct (%): 15.7%
Missing: 0
Missing (%): 0.0%
Memory size: 41.5 MiB

fc8cc9c54ec536a6cb66d2fb7ce8f17dcc1b2de870be607d9337842eccc044ba   876
7fe5f6e0a05608a00acfedce0cd83c74de3813ed5910a53ae2679f985dcaf256   800
77980723f53e66234368e2db43fda4e640fcfae134dfdd57c62fb50fd53b2273   744
6d3fcefcf9130c19b1ae9538e8870c70c68903a20e7650a3038123bea0df7997   737
8af30534bd08c8caf960bae0e9fccb8d8e43428678bd771fdc7a5b55d85677ae   720
Other values (855516)   5437611

Length

Max length: 64
Median length: 64
Mean length: 64
Min length: 64

Characters and Unicode

Total characters: 348255232
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10548
Unique (%): 0.2%

Sample

1st row: 6db890d533348b90363936295036e0e1063172632d1e5004919d6cf4c43383ea
2nd row: 6db890d533348b90363936295036e0e1063172632d1e5004919d6cf4c43383ea
3rd row: 6db890d533348b90363936295036e0e1063172632d1e5004919d6cf4c43383ea
4th row: ccd698a9d99763afd58a04cbd9b6d01612d6e8b15caf64b6034d31ce2920ca9e
5th row: ccd698a9d99763afd58a04cbd9b6d01612d6e8b15caf64b6034d31ce2920ca9e

Common Values

Value   Count   Frequency (%)
fc8cc9c54ec536a6cb66d2fb7ce8f17dcc1b2de870be607d9337842eccc044ba   876   < 0.1%
7fe5f6e0a05608a00acfedce0cd83c74de3813ed5910a53ae2679f985dcaf256   800   < 0.1%
77980723f53e66234368e2db43fda4e640fcfae134dfdd57c62fb50fd53b2273   744   < 0.1%
6d3fcefcf9130c19b1ae9538e8870c70c68903a20e7650a3038123bea0df7997   737   < 0.1%
8af30534bd08c8caf960bae0e9fccb8d8e43428678bd771fdc7a5b55d85677ae   720   < 0.1%
d18aa84b7bf0efde9c6b5db2a38ab1ec9484c59c5284c0bd080f5197bf9388b0   714   < 0.1%
74e0c5d03137d87fcb57f8bb3f2e16e6a540ee02acf11f9f38c688d4ce9ee65c   608   < 0.1%
e4632c0d05147e90c57b2a86d38a624a42fd337364b4eb9a4174a6ff919d2d2b   600   < 0.1%
9d5b4887dd3166f6284b3220de0c77136f3dc795f4d7e40711472f9f24b390f4   592   < 0.1%
8f6cc96686e671bc8f2d980f39ffe125517c4ce2407755289e52b19cdaee9961   584   < 0.1%
Other values (855511)   5434513   99.9%

Length

Histogram of lengths of the category

Most occurring characters

Value   Count   Frequency (%)
0   21872774   6.3%
6   21829713   6.3%
c   21789362   6.3%
2   21781361   6.3%
4   21775371   6.3%
f   21774608   6.3%
3   21767438   6.3%
1   21765881   6.2%
b   21748766   6.2%
7   21747877   6.2%
Other values (6)   130402081   37.4%

Most occurring categories

Value   Count   Frequency (%)
Decimal Number   217730936   62.5%
Lowercase Letter   130524296   37.5%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
0   21872774   10.0%
6   21829713   10.0%
2   21781361   10.0%
4   21775371   10.0%
3   21767438   10.0%
1   21765881   10.0%
7   21747877   10.0%
5   21744770   10.0%
9   21734287   10.0%
8   21711464   10.0%
Lowercase Letter
Value   Count   Frequency (%)
c   21789362   16.7%
f   21774608   16.7%
b   21748766   16.7%
e   21743736   16.7%
a   21739083   16.7%
d   21728741   16.6%

Most occurring scripts

Value   Count   Frequency (%)
Common   217730936   62.5%
Latin   130524296   37.5%

Most frequent character per script

Common
Value   Count   Frequency (%)
0   21872774   10.0%
6   21829713   10.0%
2   21781361   10.0%
4   21775371   10.0%
3   21767438   10.0%
1   21765881   10.0%
7   21747877   10.0%
5   21744770   10.0%
9   21734287   10.0%
8   21711464   10.0%
Latin
Value   Count   Frequency (%)
c   21789362   16.7%
f   21774608   16.7%
b   21748766   16.7%
e   21743736   16.7%
a   21739083   16.7%
d   21728741   16.6%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   348255232   100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
0   21872774   6.3%
6   21829713   6.3%
c   21789362   6.3%
2   21781361   6.3%
4   21775371   6.3%
f   21774608   6.3%
3   21767438   6.3%
1   21765881   6.2%
b   21748766   6.2%
7   21747877   6.2%
Other values (6)   130402081   37.4%

imp_hash
Categorical

HIGH CARDINALITY
MISSING

Distinct: 147702
Distinct (%): 3.0%
Missing: 469426
Missing (%): 8.6%
Memory size: 41.5 MiB

431cb9bbc479c64cb0d873043f4de547   256948
dae02f32a21e03ce65412f6e56942daa   199692
9dc46f318397655dea2844d0fd08e2ab   166816
73effd46557538d5fa5561eee3ffc59c   149958
835a0f00bf1f2c5420f77cabc26e254c   144560
Other values (147697)   4054088

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 159105984
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 50
Unique (%): < 0.1%

Sample

1st row: f34d5f2d4577ed6d9ceec516c1f5a744
2nd row: f34d5f2d4577ed6d9ceec516c1f5a744
3rd row: f34d5f2d4577ed6d9ceec516c1f5a744
4th row: 08121d2e08520cab5e5c4384900e0af4
5th row: 08121d2e08520cab5e5c4384900e0af4

Common Values

Value   Count   Frequency (%)
431cb9bbc479c64cb0d873043f4de547   256948   4.7%
dae02f32a21e03ce65412f6e56942daa   199692   3.7%
9dc46f318397655dea2844d0fd08e2ab   166816   3.1%
73effd46557538d5fa5561eee3ffc59c   149958   2.8%
835a0f00bf1f2c5420f77cabc26e254c   144560   2.7%
359d89624a26d1e756c3e9d6782d6eb0   102786   1.9%
f34d5f2d4577ed6d9ceec516c1f5a744   92117   1.7%
d66b543d0999c7628a55690ef9b1c96e   85956   1.6%
3a2003ea545fe942681da9e7683ebb58   81866   1.5%
ff0dfa05658a149b7b21130a1a8daedb   72611   1.3%
Other values (147692)   3618752   66.5%
(Missing)   469426   8.6%
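
The report lists the same imp_hash counts twice with different percentages (4.7% here, 5.2% in the table after the length histogram). A plausible reading, sketched below as an assumption rather than documented behaviour, is that this table divides by all rows while the later one divides by non-missing rows only. The counts are copied from this section.

```python
N_ROWS = 5_441_488
MISSING = 469_426        # imp_hash missing values
TOP_COUNT = 256_948      # rows with imp_hash 431cb9bbc479c64cb0d873043f4de547
HASH_LENGTH = 32         # every imp_hash value is 32 characters long

non_missing = N_ROWS - MISSING
share_of_all_rows = 100 * TOP_COUNT / N_ROWS
share_of_present_rows = 100 * TOP_COUNT / non_missing
total_characters = non_missing * HASH_LENGTH

print(round(share_of_all_rows, 1), round(share_of_present_rows, 1), total_characters)
```

The two shares round to the two percentages shown for the most common value, and the character total matches the one reported under Characters and Unicode, which suggests the character statistics also exclude missing rows.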

Length

Histogram of lengths of the category

Value   Count   Frequency (%)
431cb9bbc479c64cb0d873043f4de547   256948   5.2%
dae02f32a21e03ce65412f6e56942daa   199692   4.0%
9dc46f318397655dea2844d0fd08e2ab   166816   3.4%
73effd46557538d5fa5561eee3ffc59c   149958   3.0%
835a0f00bf1f2c5420f77cabc26e254c   144560   2.9%
359d89624a26d1e756c3e9d6782d6eb0   102786   2.1%
f34d5f2d4577ed6d9ceec516c1f5a744   92117   1.9%
d66b543d0999c7628a55690ef9b1c96e   85956   1.7%
3a2003ea545fe942681da9e7683ebb58   81866   1.6%
ff0dfa05658a149b7b21130a1a8daedb   72611   1.5%
Other values (147692)   3618752   72.8%

Most occurring characters

Value   Count   Frequency (%)
5   11088010   7.0%
4   10863738   6.8%
e   10784140   6.8%
6   10565653   6.6%
c   10437834   6.6%
3   10225295   6.4%
f   10107712   6.4%
2   9963985   6.3%
d   9937351   6.2%
9   9764846   6.1%
Other values (6)   55367420   34.8%

Most occurring categories

Value   Count   Frequency (%)
Decimal Number   99355201   62.4%
Lowercase Letter   59750783   37.6%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
5   11088010   11.2%
4   10863738   10.9%
6   10565653   10.6%
3   10225295   10.3%
2   9963985   10.0%
9   9764846   9.8%
7   9348658   9.4%
8   9345626   9.4%
0   9344467   9.4%
1   8844923   8.9%
Lowercase Letter
Value   Count   Frequency (%)
e   10784140   18.0%
c   10437834   17.5%
f   10107712   16.9%
d   9937351   16.6%
a   9342758   15.6%
b   9140988   15.3%

Most occurring scripts

Value   Count   Frequency (%)
Common   99355201   62.4%
Latin   59750783   37.6%

Most frequent character per script

Common
Value   Count   Frequency (%)
5   11088010   11.2%
4   10863738   10.9%
6   10565653   10.6%
3   10225295   10.3%
2   9963985   10.0%
9   9764846   9.8%
7   9348658   9.4%
8   9345626   9.4%
0   9344467   9.4%
1   8844923   8.9%
Latin
Value   Count   Frequency (%)
e   10784140   18.0%
c   10437834   17.5%
f   10107712   16.9%
d   9937351   16.6%
a   9342758   15.6%
b   9140988   15.3%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   159105984   100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
5   11088010   7.0%
4   10863738   6.8%
e   10784140   6.8%
6   10565653   6.6%
c   10437834   6.6%
3   10225295   6.4%
f   10107712   6.4%
2   9963985   6.3%
d   9937351   6.2%
9   9764846   6.1%
Other values (6)   55367420   34.8%

sec_chi2
Real number (ℝ)

HIGH CORRELATION
SKEWED

Distinct: 1340053
Distinct (%): 24.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4377733.923
Minimum: -1
Maximum: 4.067239526 × 10^10
Zeros: 0
Zeros (%): 0.0%
Negative: 685254
Negative (%): 12.6%
Memory size: 41.5 MiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: 52817
median: 135017.84
Q3: 916500.63
95-th percentile: 8820111.6
Maximum: 4.067239526 × 10^10
Range: 4.067239526 × 10^10
Interquartile range (IQR): 863683.63

Descriptive statistics

Standard deviation: 74580140.73
Coefficient of variation (CV): 17.03624342
Kurtosis: 44535.73396
Mean: 4377733.923
Median Absolute Deviation (MAD): 135018.84
Skewness: 132.3276782
Sum: 2.382138661 × 10^13
Variance: 5.562197391 × 10^15
Monotonicity: Not monotonic
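
For readers unfamiliar with the γ1 figure behind the SKEWED flag, the sketch below computes a bias-adjusted sample skewness and shows how a log transform tames a heavy right tail. It assumes the report's skewness is this adjusted Fisher-Pearson estimator; the function name and the toy data are hypothetical and are not drawn from the dataset.

```python
import math

def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness (G1); needs at least 3 values."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

# Toy values spanning several orders of magnitude (illustrative only).
raw = [1, 10, 100, 1_000, 10_000, 100_000]
logged = [math.log1p(v) for v in raw]

print(sample_skewness(raw), sample_skewness(logged))
```

Note that sec_chi2 contains the value -1 in 12.6% of rows, and `math.log1p(-1)` raises an error, so those rows would have to be handled separately before any such transform.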
Histogram with fixed size bins (bins=50)

Value   Count   Frequency (%)
-1   685254   12.6%
1044480   51800   1.0%
128522   48943   0.9%
125001   45249   0.8%
128015   37733   0.7%
130560   35121   0.6%
130049   30540   0.6%
512492.88   24313   0.4%
125003   22858   0.4%
917905.19   22284   0.4%
Other values (1340043)   4437393   81.5%

Value   Count   Frequency (%)
-1   685254   12.6%
0.02   1   < 0.1%
10   1   < 0.1%
11   1   < 0.1%
50.94   2   < 0.1%
51   1   < 0.1%
55   1   < 0.1%
57.2   3   < 0.1%
59.67   1   < 0.1%
63.88   1   < 0.1%

Value   Count   Frequency (%)
4.067239526 × 10^10   1   < 0.1%
3.94446848 × 10^10   1   < 0.1%
3.362373632 × 10^10   1   < 0.1%
1.761037517 × 10^10   1   < 0.1%
1.702766285 × 10^10   1   < 0.1%
1.616584192 × 10^10   1   < 0.1%
1.484128256 × 10^10   1   < 0.1%
1.2630528 × 10^10   1   < 0.1%
1.219493683 × 10^10   1   < 0.1%
1.090076058 × 10^10   1   < 0.1%

sec_entropy
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 801
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.749246043
Minimum: 0
Maximum: 8
Zeros: 819427
Zeros (%): 15.1%
Negative: 0
Negative (%): 0.0%
Memory size: 41.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1.02
median: 4.28
Q3: 5.87
95-th percentile: 7.58
Maximum: 8
Range: 8
Interquartile range (IQR): 4.85

Descriptive statistics

Standard deviation: 2.531273833
Coefficient of variation (CV): 0.6751420963
Kurtosis: -1.225028874
Mean: 3.749246043
Median Absolute Deviation (MAD): 1.94
Skewness: -0.2448234118
Sum: 20401477.35
Variance: 6.407347217
Monotonicity: Not monotonic
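
The 0 to 8 range of sec_entropy is what a Shannon entropy measured in bits per byte would produce, and 801 distinct values is consistent with rounding to two decimals. The report does not say how the column was computed, so the following is only a sketch of that standard measure; `byte_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # occurrences of each byte value
    # Sum p * log2(1/p) over observed byte values; every term is >= 0.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(byte_entropy(b"\x00" * 512))       # constant padding
print(byte_entropy(bytes(range(256))))   # every byte value exactly once
```

Under this reading, zero entropy corresponds to empty or constant-filled sections, while values near 8 are typical of compressed or encrypted data.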
Histogram with fixed size bins (bins=50)

Value   Count   Frequency (%)
0   819427   15.1%
0.2   78902   1.5%
0.08   52866   1.0%
8   51069   0.9%
5.65   43476   0.8%
0.1   43338   0.8%
6.62   37436   0.7%
0.02   32960   0.6%
6.42   29902   0.5%
2.6   29037   0.5%
Other values (791)   4223075   77.6%

Value   Count   Frequency (%)
0   819427   15.1%
0.01   6799   0.1%
0.02   32960   0.6%
0.03   1616   < 0.1%
0.04   2220   < 0.1%
0.05   2044   < 0.1%
0.06   10552   0.2%
0.07   2458   < 0.1%
0.08   52866   1.0%
0.09   1206   < 0.1%

Value   Count   Frequency (%)
8   51069   0.9%
7.99   22004   0.4%
7.98   15229   0.3%
7.97   10902   0.2%
7.96   7676   0.1%
7.95   11293   0.2%
7.94   7212   0.1%
7.93   9126   0.2%
7.92   5413   0.1%
7.91   21187   0.4%

sec_md5
Categorical

HIGH CARDINALITY

Distinct: 1611716
Distinct (%): 29.6%
Missing: 0
Missing (%): 0.0%
Memory size: 41.5 MiB

d41d8cd98f00b204e9800998ecf8427e   685254
620f0b67a91f7f74151bc5be745b7110   51788
bf619eac0cdf3f68d496ea9344137e8b   35014
1f354d76203061bfdd5a53dae48d5435   25436
89ce79b3a1e62aeb2ea80a5018651a44   24239
Other values (1611711)   4619757

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 174127616
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1318410
Unique (%): 24.2%

Sample

1st row: 96d952c4260892639b9ca8ec50185446
2nd row: 95238f87f0624c55c30bfe3af3861167
3rd row: 7592e45bb73001ecdcec6e157063dfb7
4th row: 9db1dfcdcdff708f96b0aa401c287f5c
5th row: 0a8e2d5022e80b1b7cd8d523fcbe5c8b

Common Values

Value   Count   Frequency (%)
d41d8cd98f00b204e9800998ecf8427e   685254   12.6%
620f0b67a91f7f74151bc5be745b7110   51788   1.0%
bf619eac0cdf3f68d496ea9344137e8b   35014   0.6%
1f354d76203061bfdd5a53dae48d5435   25436   0.5%
89ce79b3a1e62aeb2ea80a5018651a44   24239   0.4%
7e016ed8299b52ab729134bb6f3806a0   22210   0.4%
760079e225862788803e3a1d9dfc36bb   20852   0.4%
353210bc3dfdfeef53549dd6ae74edf2   20852   0.4%
1d7f1632a90ccd89571fb8adcbe26829   20852   0.4%
874e310a66a2ab59cfc5389b39f0503d   20852   0.4%
Other values (1611706)   4514139   83.0%
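
The most common sec_md5 value, d41d8cd98f00b204e9800998ecf8427e, is the MD5 digest of zero bytes of input. Its count (685254) equals the number of rows where sec_chi2 is -1, which suggests, although the report does not state it, that both mark sections with no hashable data. The sketch below confirms the digest with the standard library; `is_empty_digest` is a hypothetical helper for flagging such rows.

```python
import hashlib

# MD5 of empty input is a fixed, well-known constant.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()

def is_empty_digest(digest: str) -> bool:
    """Return True if `digest` is the MD5 of empty input (case-insensitive)."""
    return digest.lower() == EMPTY_MD5

print(EMPTY_MD5)
print(is_empty_digest("620f0b67a91f7f74151bc5be745b7110"))
```

Filtering on this digest before counting distinct section hashes would keep the empty-input value from dominating the frequency table.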

Length

Histogram of lengths of the category

Most occurring characters

Value   Count   Frequency (%)
8   13002484   7.5%
0   12862497   7.4%
9   12369450   7.1%
4   11799567   6.8%
e   11725826   6.7%
d   11516179   6.6%
f   11043508   6.3%
c   10702576   6.1%
2   10675356   6.1%
b   10310459   5.9%
Other values (6)   58119714   33.4%

Most occurring categories

Value   Count   Frequency (%)
Decimal Number   109102071   62.7%
Lowercase Letter   65025545   37.3%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
8   13002484   11.9%
0   12862497   11.8%
9   12369450   11.3%
4   11799567   10.8%
2   10675356   9.8%
7   10173094   9.3%
1   10086116   9.2%
3   9539242   8.7%
6   9329693   8.6%
5   9264572   8.5%
Lowercase Letter
Value   Count   Frequency (%)
e   11725826   18.0%
d   11516179   17.7%
f   11043508   17.0%
c   10702576   16.5%
b   10310459   15.9%
a   9726997   15.0%

Most occurring scripts

Value   Count   Frequency (%)
Common   109102071   62.7%
Latin   65025545   37.3%

Most frequent character per script

Common
Value   Count   Frequency (%)
8   13002484   11.9%
0   12862497   11.8%
9   12369450   11.3%
4   11799567   10.8%
2   10675356   9.8%
7   10173094   9.3%
1   10086116   9.2%
3   9539242   8.7%
6   9329693   8.6%
5   9264572   8.5%
Latin
Value   Count   Frequency (%)
e   11725826   18.0%
d   11516179   17.7%
f   11043508   17.0%
c   10702576   16.5%
b   10310459   15.9%
a   9726997   15.0%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   174127616   100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
8   13002484   7.5%
0   12862497   7.4%
9   12369450   7.1%
4   11799567   6.8%
e   11725826   6.7%
d   11516179   6.6%
f   11043508   6.3%
c   10702576   6.1%
2   10675356   6.1%
b   10310459   5.9%
Other values (6)   58119714   33.4%

raw_size
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 30960
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 290402.7543
Minimum: 0
Maximum: 4278276096
Zeros: 564952
Zeros (%): 10.4%
Negative: 0
Negative (%): 0.0%
Memory size: 41.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 512
median: 4096
Q3: 41984
95-th percentile: 774144
Maximum: 4278276096
Range: 4278276096
Interquartile range (IQR): 41472

Descriptive statistics

Standard deviation: 4527451.355
Coefficient of variation (CV): 15.59024936
Kurtosis: 174523.3689
Mean: 290402.7543
Median Absolute Deviation (MAD): 4096
Skewness: 267.9088872
Sum: 1.580223103 × 10^12
Variance: 2.049781577 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value   Count   Frequency (%)
512   815884   15.0%
0   564952   10.4%
4096   400200   7.4%
1024   272852   5.0%
2048   197407   3.6%
1536   193109   3.5%
12288   129455   2.4%
2560   124933   2.3%
20480   79401   1.5%
8192   76177   1.4%
Other values (30950)   2587118   47.5%

Value   Count   Frequency (%)
0   564952   10.4%
1   6   < 0.1%
2   5   < 0.1%
3   1   < 0.1%
4   2   < 0.1%
5   15   < 0.1%
6   2   < 0.1%
7   12   < 0.1%
8   26   < 0.1%
9   5   < 0.1%

Value   Count   Frequency (%)
4278276096   1   < 0.1%
1954047348   1   < 0.1%
1701080931   1   < 0.1%
1342177280   1   < 0.1%
1160249886   16   < 0.1%
1148837888   1   < 0.1%
988295168   1   < 0.1%
713999360   1   < 0.1%
713833984   1   < 0.1%
713819648   1   < 0.1%

virtual_size
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 317463
Distinct (%): 5.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 373683.8882
Minimum: 0
Maximum: 4294961872
Zeros: 1500
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 41.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 16
Q1: 1536
median: 8340
Q3: 61440
95-th percentile: 1015450
Maximum: 4294961872
Range: 4294961872
Interquartile range (IQR): 59904

Descriptive statistics

Standard deviation: 6187310.983
Coefficient of variation (CV): 16.55760705
Kurtosis: 138793.1926
Mean: 373683.8882
Median Absolute Deviation (MAD): 8268
Skewness: 249.1878453
Sum: 2.033396393 × 10^12
Variance: 3.82828172 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value   Count   Frequency (%)
4096   151829   2.8%
12   118300   2.2%
24   108873   2.0%
8   90456   1.7%
1540   75690   1.4%
512   55300   1.0%
9328   54025   1.0%
8340   50697   0.9%
55504   50550   0.9%
180274   50540   0.9%
Other values (317453)   4635228   85.2%

Value   Count   Frequency (%)
0   1500   < 0.1%
1   3010   0.1%
2   7369   0.1%
3   595   < 0.1%
4   12066   0.2%
5   311   < 0.1%
6   55   < 0.1%
7   22   < 0.1%
8   90456   1.7%
9   26716   0.5%

Value   Count   Frequency (%)
4294961872   1   < 0.1%
4294944573   1   < 0.1%
4278218752   1   < 0.1%
2172190184   1   < 0.1%
1950589204   1   < 0.1%
1751515136   1   < 0.1%
1711271132   1   < 0.1%
1399767480   1   < 0.1%
1148837888   1   < 0.1%
1073781856   1   < 0.1%

virtual_address
Real number (ℝ≥0)

SKEWED

Distinct: 24392
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1257902.083
Minimum: 0
Maximum: 4278538240
Zeros: 82
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 41.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 4096
Q1: 20480
median: 86016
Q3: 405504
95-th percentile: 3473408
Maximum: 4278538240
Range: 4278538240
Interquartile range (IQR): 385024

Descriptive statistics

Standard deviation: 10416917.35
Coefficient of variation (CV): 8.281183007
Kurtosis: 11702.13327
Mean: 1257902.083
Median Absolute Deviation (MAD): 81920
Skewness: 62.57661845
Sum: 6.844859089 × 10^12
Variance: 1.085121671 × 10^14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value   Count   Frequency (%)
4096   914591   16.8%
8192   162574   3.0%
32768   126245   2.3%
24576   121582   2.2%
49152   121087   2.2%
36864   120823   2.2%
40960   118109   2.2%
16384   117416   2.2%
20480   116339   2.1%
45056   114657   2.1%
Other values (24382)   3408065   62.6%

Value   Count   Frequency (%)
0   82   < 0.1%
16   1   < 0.1%
288   1   < 0.1%
392   86   < 0.1%
416   4   < 0.1%
432   1   < 0.1%
448   21   < 0.1%
480   3   < 0.1%
512   77   < 0.1%
528   2   < 0.1%

Value   Count   Frequency (%)
4278538240   1   < 0.1%
3555282944   1   < 0.1%
2884145152   1   < 0.1%
1953640448   1   < 0.1%
1953488896   1   < 0.1%
1953419264   1   < 0.1%
1712082944   1   < 0.1%
1712037888   1   < 0.1%
1400750080   1   < 0.1%
1075634176   1   < 0.1%

sec_name
Categorical

HIGH CARDINALITY

Distinct: 19779
Distinct (%): 0.4%
Missing: 28719
Missing (%): 0.5%
Memory size: 41.5 MiB

.rsrc   893409
.text   798340
.rdata   691743
.data   678988
.reloc   594267
Other values (19774)   1756022

Length

Max length: 8
Median length: 7
Mean length: 5.251314992
Min length: 1

Characters and Unicode

Total characters: 28424155
Distinct characters: 94
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13358
Unique (%): 0.2%

Sample

1st row: .text
2nd row: .rsrc
3rd row: .reloc
4th row: .text
5th row: .rdata

Common Values

Value   Count   Frequency (%)
.rsrc   893409   16.4%
.text   798340   14.7%
.rdata   691743   12.7%
.data   678988   12.5%
.reloc   594267   10.9%
.idata   210899   3.9%
.tls   169106   3.1%
.pdata   168938   3.1%
.bss   92047   1.7%
CODE   83159   1.5%
Other values (19769)   1031873   19.0%

Length

Histogram of lengths of the category

Value   Count   Frequency (%)
rsrc   893459   16.5%
text   799350   14.8%
data   762219   14.1%
rdata   707954   13.1%
reloc   594277   11.0%
idata   211452   3.9%
bss   174144   3.2%
tls   169125   3.1%
pdata   168947   3.1%
code   90382   1.7%
Other values (19110)   841460   15.5%

Most occurring characters

Value   Count   Frequency (%)
.   4784717   16.8%
a   3803333   13.4%
t   3772939   13.3%
r   3162062   11.1%
d   1981968   7.0%
c   1574420   5.5%
e   1522461   5.4%
s   1397860   4.9%
l   901280   3.2%
x   856865   3.0%
Other values (84)   4666250   16.4%

Most occurring categories

Value   Count   Frequency (%)
Lowercase Letter   20904170   73.5%
Other Punctuation   4869776   17.1%
Uppercase Letter   2161006   7.6%
Decimal Number   451566   1.6%
Connector Punctuation   28300   0.1%
Modifier Symbol   5738   < 0.1%
Math Symbol   1206   < 0.1%
Dash Punctuation   1127   < 0.1%
Open Punctuation   545   < 0.1%
Close Punctuation   529   < 0.1%

Most frequent character per category

Lowercase Letter
Value   Count   Frequency (%)
a   3803333   18.2%
t   3772939   18.0%
r   3162062   15.1%
d   1981968   9.5%
c   1574420   7.5%
e   1522461   7.3%
s   1397860   6.7%
l   901280   4.3%
x   856865   4.1%
o   710803   3.4%
Other values (16)   1220179   5.8%
Uppercase Letter
Value   Count   Frequency (%)
S   246247   11.4%
A   230427   10.7%
P   229476   10.6%
U   216276   10.0%
D   213420   9.9%
X   193964   9.0%
T   144128   6.7%
E   131461   6.1%
C   123528   5.7%
B   104762   4.8%
Other values (16)   327317   15.1%
Other Punctuation
Value   Count   Frequency (%)
.   4784717   98.3%
/   56578   1.2%
#   16015   0.3%
\   9596   0.2%
:   491   < 0.1%
?   466   < 0.1%
@   389   < 0.1%
!   339   < 0.1%
*   208   < 0.1%
;   192   < 0.1%
Other values (5)   785   < 0.1%
Decimal Number
Value   Count   Frequency (%)
0   180722   40.0%
1   154751   34.3%
2   32214   7.1%
4   25547   5.7%
5   12774   2.8%
7   12388   2.7%
9   11827   2.6%
3   8954   2.0%
8   7218   1.6%
6   5171   1.1%
Math Symbol
Value   Count   Frequency (%)
=   368   30.5%
|   234   19.4%
+   190   15.8%
>   184   15.3%
<   160   13.3%
~   70   5.8%
Close Punctuation
Value   Count   Frequency (%)
]   291   55.0%
}   132   25.0%
)   106   20.0%
Open Punctuation
Value   Count   Frequency (%)
{   273   50.1%
[   157   28.8%
(   115   21.1%
Modifier Symbol
Value   Count   Frequency (%)
^   5570   97.1%
`   168   2.9%
Connector Punctuation
Value   Count   Frequency (%)
_   28300   100.0%
Dash Punctuation
Value   Count   Frequency (%)
-   1127   100.0%
Currency Symbol
Value   Count   Frequency (%)
$   192   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   23065176   81.1%
Common   5358979   18.9%

Most frequent character per script

Latin
Value   Count   Frequency (%)
a   3803333   16.5%
t   3772939   16.4%
r   3162062   13.7%
d   1981968   8.6%
c   1574420   6.8%
e   1522461   6.6%
s   1397860   6.1%
l   901280   3.9%
x   856865   3.7%
o   710803   3.1%
Other values (42)   3381185   14.7%
Common
Value   Count   Frequency (%)
.   4784717   89.3%
0   180722   3.4%
1   154751   2.9%
/   56578   1.1%
2   32214   0.6%
_   28300   0.5%
4   25547   0.5%
#   16015   0.3%
5   12774   0.2%
7   12388   0.2%
Other values (32)   54973   1.0%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   28424155   100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
.   4784717   16.8%
a   3803333   13.4%
t   3772939   13.3%
r   3162062   11.1%
d   1981968   7.0%
c   1574420   5.5%
e   1522461   5.4%
s   1397860   4.9%
l   901280   3.2%
x   856865   3.0%
Other values (84)   4666250   16.4%

Interactions

[Interaction plots: 49 Matplotlib figures, not reproduced]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
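
The rank-then-correlate recipe above can be sketched in plain Python. The helper names `average_ranks`, `pearson_r`, and `spearman_rho` are illustrative rather than part of any library, and tied values receive their average rank, which is a common convention assumed here rather than something the text specifies.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (std_x * std_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: rho is (numerically) 1, r is below 1.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```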

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
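
As a minimal sketch of that calculation in plain Python (the function name `pearson_r` is illustrative, and population rather than sample statistics are assumed, which does not change the ratio):

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (std_x * std_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]
# Shifting and rescaling ys (with a positive factor) leaves r unchanged
# up to floating-point rounding.
print(pearson_r(xs, ys), pearson_r(xs, [10 * y + 7 for y in ys]))
```

The second call illustrates the location and scale invariance mentioned above.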

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
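
The pair-counting definition can be sketched directly in plain Python. This is the simplest variant (often called τ-a), in which tied pairs count as neither concordant nor discordant; library implementations commonly use tie-adjusted variants, so their results can differ when the data contain ties. The function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, by brute-force pair counting (O(n^2))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The quadratic loop is fine for illustration but would be impractical at the scale of this dataset (over five million rows).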

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
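
One way to read the nullity correlation described above is as the Pearson correlation between the present/missing indicators of two columns. The sketch below follows that reading in plain Python; the function name `nullity_correlation`, the use of `None` as the missing marker, and the toy columns are assumptions for illustration, not the report's own implementation.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 0/1 'value is present' indicators of two columns.

    Returns None when either column is entirely present or entirely missing,
    because the correlation is undefined for a constant indicator.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("need two non-empty columns of equal length")
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # The 1/n factors of covariance and variances cancel, so raw sums suffice.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Toy columns (hypothetical values): the first two are missing in the same rows,
# the third is missing exactly where the first is present.
first = ["a", None, "b", None]
second = [1, None, 2, None]
third = [None, 1, None, 2]
print(nullity_correlation(first, second), nullity_correlation(first, third))
```

A value near +1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing.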

Sample

First rows

Unnamed: 0 | filename | win_count | sha256 | imp_hash | sec_chi2 | sec_entropy | sec_md5 | raw_size | virtual_size | virtual_address | sec_name
002022042404/2022042404_4216db890d533348b90363936295036e0e1063172632d1e5004919d6cf4c43383eaf34d5f2d4577ed6d9ceec516c1f5a7441620180.755.5596d952c4260892639b9ca8ec5018544668608682448192.text
112022042404/2022042404_4216db890d533348b90363936295036e0e1063172632d1e5004919d6cf4c43383eaf34d5f2d4577ed6d9ceec516c1f5a74492516.003.1895238f87f0624c55c30bfe3af38611671024101681920.rsrc
222022042404/2022042404_4216db890d533348b90363936295036e0e1063172632d1e5004919d6cf4c43383eaf34d5f2d4577ed6d9ceec516c1f5a744128015.000.107592e45bb73001ecdcec6e157063dfb75121290112.reloc
332022042404/2022042404_422ccd698a9d99763afd58a04cbd9b6d01612d6e8b15caf64b6034d31ce2920ca9e08121d2e08520cab5e5c4384900e0af45598081.006.619db1dfcdcdff708f96b0aa401c287f5c104601610459704096.text
442022042404/2022042404_422ccd698a9d99763afd58a04cbd9b6d01612d6e8b15caf64b6034d31ce2920ca9e08121d2e08520cab5e5c4384900e0af416481124.004.420a8e2d5022e80b1b7cd8d523fcbe5c8b2560002555781052672.rdata
552022042404/2022042404_422ccd698a9d99763afd58a04cbd9b6d01612d6e8b15caf64b6034d31ce2920ca9e08121d2e08520cab5e5c4384900e0af4907516.444.343cd351664389d308a82e9b28fab9337613312403921310720.data
662022042404/2022042404_422ccd698a9d99763afd58a04cbd9b6d01612d6e8b15caf64b6034d31ce2920ca9e08121d2e08520cab5e5c4384900e0af45571472.505.84eacdcbbb43900e1a6ed94beae9dba1c82452482448041351680.rsrc
772022042404/2022042404_422ccd698a9d99763afd58a04cbd9b6d01612d6e8b15caf64b6034d31ce2920ca9e08121d2e08520cab5e5c4384900e0af43200556.505.244ee8054f0d3d7d70df4dd4f06815c05399328988941597440.reloc
882022042404/2022042404_4236a4ffa7ffb47768fe7c8af5b9b4ea12d5bae52bd2f71019ce8a30f603d0c914c0c57d0c2e73797f4b6ac8c85bc087fa77798533.006.33d9c7cc4d3aabb1f0dbb967ff871b7b5c109260810923644096.text
992022042404/2022042404_4236a4ffa7ffb47768fe7c8af5b9b4ea12d5bae52bd2f71019ce8a30f603d0c914c0c57d0c2e73797f4b6ac8c85bc087fa79199987.005.6589c5d4e2dbe1da73a8e707b0a073946c3225603225081097728.rdata

Last rows

Unnamed: 0 | filename | win_count | sha256 | imp_hash | sec_chi2 | sec_entropy | sec_md5 | raw_size | virtual_size | virtual_address | sec_name
544147854414782022042606/2022042606_7654976d1e820ebd2a447730e3a0b9fc7f879ec03452ef7ee9b8c061848858d1954de94f34d5f2d4577ed6d9ceec516c1f5a74413061.557.9986a5809e22e33c46ea7b8fa07920fd38142131214210928192.text
544147954414792022042606/2022042606_7654976d1e820ebd2a447730e3a0b9fc7f879ec03452ef7ee9b8c061848858d1954de94f34d5f2d4577ed6d9ceec516c1f5a74482026.834.026ed4b01d54c16a1b818758d7904bd658153614221433600.rsrc
544148054414802022042606/2022042606_7654976d1e820ebd2a447730e3a0b9fc7f879ec03452ef7ee9b8c061848858d1954de94f34d5f2d4577ed6d9ceec516c1f5a744128015.000.10f896439e2645c447f6af5850ea388d96512121441792.reloc
544148154414812022042606/2022042606_765497791a8152336629d958079fe167ef34214a8ee1c4053133ee040d5ec1a7951457ee4484cbb4aaf3e3795884c4db70093c41159735.636.76b61aacfe7d30983d2c6bcc96da951ce12590722586914096.text
544148254414822022042606/2022042606_765497791a8152336629d958079fe167ef34214a8ee1c4053133ee040d5ec1a7951457ee4484cbb4aaf3e3795884c4db70093c438386.264.62a08950861fd8c953813518c94999cff415361224266240.text1
544148354414832022042606/2022042606_765497791a8152336629d958079fe167ef34214a8ee1c4053133ee040d5ec1a7951457ee4484cbb4aaf3e3795884c4db70093c41436322.256.36bebe93d763f456646d6571671a3ce6ad7987279548270336.rdata
544148454414842022042606/2022042606_765497791a8152336629d958079fe167ef34214a8ee1c4053133ee040d5ec1a7951457ee4484cbb4aaf3e3795884c4db70093c4984899.253.41c5696133b81c951c807d1ba3288574ab972818504352256.data
544148554414852022042606/2022042606_765497791a8152336629d958079fe167ef34214a8ee1c4053133ee040d5ec1a7951457ee4484cbb4aaf3e3795884c4db70093c4198750.752.80a817516ee30ad025f98d6524e8f2670c20481700372736.data1
544148654414862022042606/2022042606_765497791a8152336629d958079fe167ef34214a8ee1c4053133ee040d5ec1a7951457ee4484cbb4aaf3e3795884c4db70093c476550.985.85962e116ec88fec69f474bbd175e02d4297289304376832.trace
544148754414872022042606/2022042606_765497791a8152336629d958079fe167ef34214a8ee1c4053133ee040d5ec1a7951457ee4484cbb4aaf3e3795884c4db70093c4243924.633.59ba850557e059c68972183354def82c0430722900389120.rsrc